Code Generation in Analytixus with Python and Jinja2: A Practical, Scalable Pattern

Audience: Analytixus users and Data Engineers

This article demonstrates how to generate SQL code from metadata in Analytixus using Python and Jinja2. We use a simple, understandable example (a CREATE TABLE statement) to show the pattern clearly. In real projects, you will modularize and extend this approach for more artifacts (views, DDL migrations, MERGE statements, CDC logic, and entire load pipelines). Treat everything below as an example to illustrate the principles; future modularization is both necessary and recommended.

Further reading: the Jinja documentation and Jinja2 on PyPI.


Why Python + Jinja2 in Analytixus?

  • Scalability: One metadata object → one code file; hundreds of objects → hundreds of files, automatically.
  • Consistency: A single source‑of‑truth template enforces conventions across all generated artifacts.
  • Speed of change: Update the template or type mapping once; regenerate all outputs.
  • Separation of concerns: Business metadata lives in Analytixus; rendering logic lives in templates and small utility functions.
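The scalability point can be sketched in a few lines: one template, rendered once per metadata object, yields one output file per object. This is an illustrative sketch, not Analytixus code; the object dictionaries and file-naming scheme are assumptions.

```python
# Illustrative sketch: one template rendered for many metadata objects
# demonstrates the "one object -> one file" scaling idea.
from jinja2 import Template

template = Template("CREATE TABLE `{{ schema }}`.`Store_{{ name }}` (...);")

# Hypothetical metadata objects; in Analytixus these come from the repository.
objects = [
    {"schema": "Application", "name": "Cities"},
    {"schema": "Application", "name": "Countries"},
]

# Hundreds of objects would produce hundreds of files with no extra effort.
files = {f"{o['schema']}_{o['name']}.sql": template.render(**o) for o in objects}
```

Changing the template once and re-rendering regenerates every file consistently, which is the core of the "speed of change" argument.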

The Stable Interface Contract in Analytixus

Analytixus calls your generator script in a standard way:

  • Method signature: generate(md: dict) -> str
  • Input: A single parameter md that contains the parsed JSON metadata.
  • Output: A single string that contains the generated code, which Analytixus writes to a file.
  • Constraint: Do not change the method name generate, the parameter contract, or the requirement to return a string. Internals are up to you, as long as the method returns the expected string.
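A minimal contract-conforming skeleton might look like this. The name, parameter, and return type are fixed by Analytixus; the fail-fast validation and the placeholder return value are illustrative assumptions.

```python
# Hedged sketch of the generate(md) contract: signature fixed, internals free.
def generate(md: dict) -> str:
    """Receives parsed JSON metadata, returns the generated code as one string."""
    # Fail fast on missing essentials to avoid silent mis-generation.
    for required in ("SchemaName", "ObjectName", "Column"):
        if required not in md:
            raise ValueError(f"Missing required metadata field: {required}")
    # ... build the real artifact here; placeholder output for illustration ...
    return f"-- generated for {md['SchemaName']}.{md['ObjectName']}"
```

Analytixus calls this once per metadata object and writes the returned string to a file.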

Project Structure (DBX-Demo-WWI)

We’ll use the following layout:

DBX-Demo-WWI (project-root)/
  Bronze/
      02_BronzeStoreWithPy.pyx      # contains the generate(md) function
  Jinja_templates/
      bronze_store_table.sql.jinja2
  Silver/
      ...

  • The generator lives in Bronze/02_BronzeStoreWithPy.pyx.
  • Templates live in Jinja_templates/ at the project root.
  • Silver (and other layers) will follow similar generation patterns later.

Example Metadata (from Analytixus)

We’ll generate a Databricks SQL CREATE TABLE for the WideWorldImporters source table Application.Cities using this metadata sample:

{
  "StartIndex": "53",
  "BraceOpenIndex": "188",
  "BraceCloseIndex": "963",
  "Id": "Source.WWI.Application.Cities",
  "SchemaName": "Application",
  "ObjectName": "Cities",
  "TableName": "Application.Cities",
  "ObjectType": "Table",
  "CatalogName": "bsh-sqldb-WideWorldImporters",
  "SourceName": "WWI",
  "SourceType": "SqlDatabase",
  "Column": [
    {
      "StartIndex": "209",
      "EndIndex": "297",
      "Id": "Source.WWI.Application.Cities.CityID",
      "ColumnName": "CityID",
      "Ordinal": "1",
      "DataType": "INT",
      "IsNullable": "0",
      "IsPK": "1",
      "IsBK": "0",
      "IsKey": "1",
      "BasisDataType": "INTEGER",
      "IsTechnical": "0"
    },
    {
      "StartIndex": "304",
      "EndIndex": "397",
      "Id": "Source.WWI.Application.Cities.CityName",
      "ColumnName": "CityName",
      "Ordinal": "2",
      "DataType": "NVARCHAR",
      "Length": "50",
      "IsNullable": "0",
      "IsPK": "0",
      "IsBK": "0",
      "IsKey": "0",
      "BasisDataType": "ALPHANUMERIC",
      "IsTechnical": "0"
    },
    {
      "StartIndex": "404",
      "EndIndex": "490",
      "Id": "Source.WWI.Application.Cities.StateProvinceID",
      "ColumnName": "StateProvinceID",
      "Ordinal": "3",
      "DataType": "INT",
      "IsNullable": "0",
      "IsPK": "0",
      "IsBK": "0",
      "IsKey": "0",
      "BasisDataType": "INTEGER",
      "IsTechnical": "0"
    },
    {
      "StartIndex": "497",
      "EndIndex": "580",
      "Id": "Source.WWI.Application.Cities.Location",
      "ColumnName": "Location",
      "Ordinal": "4",
      "DataType": "GEOGRAPHY",
      "IsNullable": "1",
      "IsPK": "0",
      "IsBK": "0",
      "IsKey": "0",
      "BasisDataType": "GEOGRAPHY",
      "IsTechnical": "0"
    },
    {
      "StartIndex": "587",
      "EndIndex": "681",
      "Id": "Source.WWI.Application.Cities.LatestRecordedPopulation",
      "ColumnName": "LatestRecordedPopulation",
      "Ordinal": "5",
      "DataType": "BIGINT",
      "IsNullable": "1",
      "IsPK": "0",
      "IsBK": "0",
      "IsKey": "0",
      "BasisDataType": "INTEGER",
      "IsTechnical": "0"
    },
    {
      "StartIndex": "688",
      "EndIndex": "771",
      "Id": "Source.WWI.Application.Cities.LastEditedBy",
      "ColumnName": "LastEditedBy",
      "Ordinal": "6",
      "DataType": "INT",
      "IsNullable": "0",
      "IsPK": "0",
      "IsBK": "0",
      "IsKey": "0",
      "BasisDataType": "INTEGER",
      "IsTechnical": "0"
    },
    {
      "StartIndex": "778",
      "EndIndex": "863",
      "Id": "Source.WWI.Application.Cities.ValidFrom",
      "ColumnName": "ValidFrom",
      "Ordinal": "7",
      "DataType": "DATETIME",
      "Length": "7",
      "IsNullable": "0",
      "IsPK": "0",
      "IsBK": "0",
      "IsKey": "0",
      "BasisDataType": "TIME",
      "IsTechnical": "0"
    },
    {
      "StartIndex": "870",
      "EndIndex": "952",
      "Id": "Source.WWI.Application.Cities.ValidTo",
      "ColumnName": "ValidTo",
      "Ordinal": "8",
      "DataType": "DATETIME",
      "Length": "7",
      "IsNullable": "0",
      "IsPK": "0",
      "IsBK": "0",
      "IsKey": "0",
      "BasisDataType": "TIME",
      "IsTechnical": "0"
    }
  ]
}

Notes:

  • Many numeric-like fields are strings (e.g., "Ordinal": "1"). Normalize them in code.
  • Some types have auxiliary fields like Length; compose NVARCHAR(50) before mapping to a Databricks-compatible type.
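The two normalization steps above can be sketched with small helpers (the helper names are assumptions for illustration):

```python
# Illustrative helpers for the notes above.
def to_bool(value: str) -> bool:
    """Metadata flags arrive as strings; bool("0") would be True, so parse via int."""
    return bool(int(value))

def compose_raw_type(col: dict) -> str:
    """Attach Length to the base type when present, e.g. NVARCHAR + 50 -> NVARCHAR(50)."""
    data_type = col["DataType"]
    if "Length" in col:
        return f"{data_type}({col['Length']})"
    return data_type
```

The bool pitfall is easy to miss: `bool("0")` is `True` in Python, so string flags must be converted via `int` first.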

The Jinja2 Template

We render a Databricks SQL CREATE TABLE for the bronze “Store” layer. Technical columns appear first, then business columns from metadata, followed by a PK constraint.

CREATE TABLE IF NOT EXISTS `{{ schema }}`.`Store_{{ table_name }}`
(
    -- Technical Fields
{%- for col in technical_columns %}
    {% if not loop.first %},{% endif %}`{{ col.name }}` {{ col.data_type }}{% if not col.nullable %} NOT NULL{% endif %}{% if col.comment %} /*{{ col.comment }}*/{% endif %}
{%- endfor %}
    -- Custom Fields
{%- for col in business_columns %}
    , `{{ col.name }}` {{ col.data_type }}{% if not col.nullable %} NOT NULL{% endif %}{% if col.is_business_key %} /*IsBK*/{% elif col.comment %} /*{{ col.comment }}*/{% endif %}
{%- endfor %}
    -- PK-CONSTRAINTs
    ,CONSTRAINT `PK_{{ catalog }}_{{ schema }}_Store_{{ table_name }}` PRIMARY KEY ({% for pk in primary_key_columns %}{% if not loop.first %}, {% endif %}`{{ pk }}`{% endfor %})
);

Template tips:

  • Use loop.first to avoid trailing commas.
  • Keep comments to aid auditing (/*IsBK*/ etc.).
  • Adopt consistent naming conventions (e.g., Store_{{ table_name }}).
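The loop.first comma pattern from the first tip can be checked standalone by rendering a template from a string (a minimal excerpt; the real template lives in Jinja_templates/):

```python
# Quick standalone check of the loop.first comma pattern used in the template.
from jinja2 import Environment

snippet = (
    "{% for col in cols %}"
    "{% if not loop.first %}, {% endif %}`{{ col }}`"
    "{% endfor %}"
)

rendered = Environment().from_string(snippet).render(cols=["a", "b", "c"])
# No leading or trailing comma, regardless of column count.
```

Rendering from a string this way is a handy trick for iterating on template fragments without touching the file system.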

The generate(md) Function in Bronze/02_BronzeStoreWithPy.pyx

Below is a compact, end-to-end example that:

  • Normalizes incoming column metadata,
  • Maps SQL Server/generic types to Databricks-compatible types,
  • Loads the Jinja2 template from ../Jinja_templates,
  • Returns the SQL string.

This is an illustrative example. In real projects, modularize and extract reusable helpers.

from jinja2 import Environment, FileSystemLoader

def generate(md: dict) -> str:
    """Receives metadata, generates code and returns it as string."""
    # Prepare technical columns
    technical_columns = [
        {"name": "DWH_DEF_DATE", "data_type": "TIMESTAMP", "nullable": False, "comment": "IsTechnical"},
        {"name": "DWH_LOAD_ID", "data_type": "INT", "nullable": False, "comment": "IsTechnical"},
        {"name": "DWH_RECORD_ID", "data_type": "BIGINT GENERATED ALWAYS AS IDENTITY", "nullable": False, "comment": "IsTechnical"},
        {"name": "DWH_SOURCE_ID", "data_type": "INT", "nullable": False, "comment": "IsTechnical"},
        {"name": "DWH_SOURCETABLE", "data_type": "STRING", "nullable": True, "comment": "IsTechnical"}
    ]

    # Type mapping
    DATABRICKS_TYPE_MAPPING = { ... }  # as above
    def normalize_type_for_databricks(data_type: str) -> str: ...  # as above

    def _to_column(input_col: dict) -> dict:
        """Convert a metadata column dictionary to a column definition dictionary."""
        # Compose the raw type first (e.g., NVARCHAR + Length 50 -> NVARCHAR(50)).
        raw_type = input_col["DataType"]
        if "Length" in input_col:
            raw_type = f"{raw_type}({input_col['Length']})"
        return {
            "name": input_col["ColumnName"],
            "data_type": normalize_type_for_databricks(raw_type),
            # Flags arrive as strings ("0"/"1"); bool("0") would be True, so parse via int.
            "nullable": bool(int(input_col["IsNullable"])),
            "is_business_key": bool(int(input_col["IsBK"])),
            "Ordinal": int(input_col["Ordinal"]),
        }

    business_columns = sorted([_to_column(in_col) for in_col in md["Column"]], key=lambda x: x["Ordinal"])

    # NOW GENERATE!
    # Resolve the template folder relative to this script to keep the build portable.
    from pathlib import Path  # local import keeps this example self-contained
    template_dir = Path(__file__).resolve().parent.parent / "Jinja_templates"
    env = Environment(loader=FileSystemLoader(str(template_dir)))
    template = env.get_template("bronze_store_table.sql.jinja2")

    return template.render(
        catalog=md["CatalogName"],
        schema=md["SchemaName"],
        table_name=md["ObjectName"],
        technical_columns=technical_columns,
        business_columns=business_columns,
        primary_key_columns=["DWH_RECORD_ID"]
    )

Key points:

  • Uses relative path resolution to keep the build portable across environments.
  • Composes raw types (e.g., NVARCHAR(50)) before mapping to Databricks-compatible types.
  • Defaults complex types like GEOGRAPHY and XML to STRING for bronze ingestion; adjust for curated layers.
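The mapping and normalizer elided in the generator (`# as above`) could look like the following sketch. The concrete mapping table is an assumption for illustration; extend it per project.

```python
import re

# Illustrative mapping -- unknown types fall back to STRING for bronze
# ingestion (a conscious default, per the key points above).
DATABRICKS_TYPE_MAPPING = {
    "INT": "INT",
    "BIGINT": "BIGINT",
    "NVARCHAR": "STRING",
    "VARCHAR": "STRING",
    "DATETIME": "TIMESTAMP",
    "GEOGRAPHY": "STRING",
    "XML": "STRING",
}

def normalize_type_for_databricks(data_type: str) -> str:
    """Map a raw source type (possibly with length, e.g. NVARCHAR(50)) to a Databricks type."""
    base = re.match(r"[A-Za-z]+", data_type.strip()).group(0).upper()
    if base == "DECIMAL":
        return data_type.upper()  # preserve precision/scale, e.g. DECIMAL(38,18)
    return DATABRICKS_TYPE_MAPPING.get(base, "STRING")
```

Note how the length suffix is stripped before lookup: NVARCHAR(50) maps via its base type NVARCHAR, while DECIMAL keeps its precision and scale intact.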

Example Output (from the provided metadata)

With the metadata above, the template produces the following Databricks SQL:

CREATE TABLE IF NOT EXISTS `Application`.`Store_Cities`
(
	-- Technical Fields
	`DWH_DEF_DATE` TIMESTAMP NOT NULL /*IsTechnical*/
	,`DWH_LOAD_ID` INT NOT NULL /*IsTechnical*/
	,`DWH_RECORD_ID` BIGINT GENERATED ALWAYS AS IDENTITY NOT NULL /*IsTechnical*/
	,`DWH_SOURCE_ID` INT NOT NULL /*IsTechnical*/
	,`DWH_SOURCETABLE` STRING /*IsTechnical*/
	-- Custom Fields
	, `CityID` INT NOT NULL
	, `CityName` STRING NOT NULL
	, `StateProvinceID` INT NOT NULL
	, `Location` STRING
	, `LatestRecordedPopulation` BIGINT
	, `LastEditedBy` INT NOT NULL
	, `ValidFrom` TIMESTAMP NOT NULL
	, `ValidTo` TIMESTAMP NOT NULL
	-- PK-CONSTRAINTs
	,CONSTRAINT `PK_bsh-sqldb-WideWorldImporters_Application_Store_Cities` PRIMARY KEY (`DWH_RECORD_ID`)
);

Operational Flow in Analytixus

  • The build process selects metadata objects and calls generate(md) for each.
  • Your function returns a string; Analytixus writes each result to its destination file (e.g., based on object name and schema).
  • Determinism matters for reviews and CI: sort columns by Ordinal, not by name.

Testing and Validation

  • Add unit tests with representative metadata samples to verify:
    • Type normalization (e.g., NVARCHAR(50) → STRING).
    • Precision/scale preservation for DECIMAL(38,18) (when present).
    • Defaults for complex types (GEOGRAPHY, XML) — choose strict vs. lenient behavior per layer.
  • Validate required fields early (e.g., SchemaName, ObjectName, Column presence). Fail fast on missing essentials to avoid silent mis-generation.
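The determinism point from the operational flow is also testable. A sketch with plain asserts (adapt to your test framework; the sample columns are assumptions):

```python
# Sketch of a determinism check: columns must be ordered by Ordinal, not by name.
sample = [
    {"ColumnName": "CityName", "Ordinal": "2"},
    {"ColumnName": "CityID", "Ordinal": "1"},
]

# Ordinal arrives as a string, so sort on its integer value.
ordered = sorted(sample, key=lambda c: int(c["Ordinal"]))
assert [c["ColumnName"] for c in ordered] == ["CityID", "CityName"]
```

A sort on the raw string would only coincide with numeric order below Ordinal 10, which is exactly the kind of silent mis-generation fail-fast tests should catch.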

Extending Beyond CREATE TABLE

This pattern generalizes easily:

  • Generate views (silver/gold), MERGE statements, CDC logic, orchestration/job configs, and more.
  • Emit multiple files per object (e.g., DDL + DML + pipeline configuration).
  • The CREATE TABLE example is intentionally simple to demonstrate the approach. In real projects, modularize helpers, factor configurations, and introduce environment-aware conventions.

Best Practices and Pitfalls

  • Prefer relative template paths to avoid hard-coded strings and ensure portability.
  • Keep comments and naming conventions consistent and documented (e.g., Store_ prefix).
  • Separate technical vs. business columns clearly; later, make technical columns configurable when modularizing.
  • Make unknown types a conscious decision: warn or fail. Bronze layers often accept strings; curated layers should be stricter.
  • Preserve precision/scale when mapping numeric types.
  • Ensure deterministic output for clean diffs and reliable CI/CD.

Conclusion

Python + Jinja2 provides a clean, scalable way to transform Analytixus metadata into consistent, reviewable code. With a stable generate(md) -> str contract, robust type normalization, and deterministic templates, you can evolve from simple DDL generation to complete, metadata-driven pipelines across bronze, silver, and gold layers. This article’s code is a minimal example to illustrate the pattern; in production, modularize utilities, configurations, and tests to keep the system maintainable and flexible.

DnAML (Data and Analytics Markup Language): A Practical Guide for Analytixus

Data and analytics teams juggle multiple concerns: raw source onboarding, semantic enrichment, lineage tracking, dependency management, and consistent code generation across pipelines and layers. DnAML (Data and Analytics Markup Language) is a domain-specific, JSON-like metadata format designed to unify these concerns in a single, readable source of truth. This post explains what DnAML is, what you can do with it, why you might need it, and how to use it—plus small, copy-ready examples you can adapt for your Analytixus projects.


What is DnAML?

DnAML is a compact, human-readable language for semantically describing and orchestrating data sources, structures, and transformations in analytics solutions. It lets you centrally define and control data models along the processing chain—from raw ingestion through semantic modeling to analytical delivery. Conceptually, unsupervised metadata extracted from source systems is transformed into a curated DnAML model, forming the second stage in the metadata life cycle.

DnAML borrows a familiar, block-oriented structure reminiscent of JSON but adds its own semantics for:

  • Declaring tables and views
  • Modeling column types, nullability, defaults, keys, and free-form options
  • Defining lineage via ORIGINS with UNION/JOIN behavior
  • Expressing cross-object relationships via REFERENCES
  • Organizing models with SOURCES and FOLDER constructs

These core constructs are part of the language grammar and enable consistent, machine- and human-friendly modeling of data assets.


What can you do with DnAML?

  • Define data objects (tables, views) and their attributes with clear types, constraints, and options.
  • Document lineage with ORIGINS, composing objects from upstream OBJECTs using UNION or JOIN and explicit field mappings.
  • Specify relationships and matching rules through REFERENCES, including comparisons, ranges, constants, and optional expressions to annotate logic.
  • Organize models by SOURCE (systems) and FOLDER (domains or layers like Bronze, Silver, Gold) to reflect your architecture.
  • Capture operational metadata via free-form options in square brackets—for example, schema names, owners, synchronization hints, and more.

Why DnAML?

Teams need a consistent way to represent:

  • What an object contains (columns, types, constraints),
  • Where it comes from (lineage and mapping),
  • How it relates to other objects (references),
  • How the model is organized (sources, folders, layers).

By keeping all of this in one language, you gain:

  • Clarity and maintainability across pipeline stages
  • Versionability of models as code
  • A foundation for automation (generating DDL, ETL/ELT jobs, notebooks, or validation rules)
  • End-to-end traceability of attribute origins across processing steps

The DnAML grammar natively supports all of these aspects, aligning modeling, lineage, and relationships in a single definition.


How DnAML is used in practice

DnAML uses statements and blocks with consistent syntax:

  • Top-level statements include TABLE, VIEW, REFERENCES, SOURCES, and FOLDER.
  • Blocks use braces { ... }, entries are comma-separated, and semicolons are optional.
  • Identifiers can be dot-qualified (e.g., Application.Cities) or quoted when they include spaces or special characters (e.g., "Order Item").
  • Free-form options appear in square brackets [ ... ] as either bare terms or key: value pairs, where values can be identifiers, strings, or numbers.
  • Comments use /* ... */ and are ignored by parsers.
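To make the option syntax concrete, here is a hedged Python sketch of parsing a bracketed option list as described above. Real parsers in Analytixus may differ; this only illustrates the bare-term vs. key:value distinction.

```python
# Illustrative parser for a DnAML option list such as
# "[SchemaName:Application,Ordinal:1,UNIQUE]".
def parse_options(option_text: str) -> dict:
    options = {}
    body = option_text.strip().lstrip("[").rstrip("]")
    for entry in filter(None, (e.strip() for e in body.split(","))):
        if ":" in entry:
            # key: value pair; values may be identifiers, strings, or numbers.
            key, value = (part.strip().strip("'\"") for part in entry.split(":", 1))
            options[key] = value
        else:
            options[entry] = True  # bare term acts as a flag
    return options
```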

Workflow: Integrating XML and DnAML

In a typical metadata management workflow, XML and DnAML complement each other:

  • Unsupervised metadata: Automatically extracted metadata from source systems or code is initially stored in XML. This stage captures raw, uncurated information.
  • DnAML modeling: Unsupervised metadata is converted into a DnAML model, where it is structured, semantically enriched, and maintained. Here, you also define mappings between source and target structures.
  • Supervised metadata: Once curated and validated in DnAML, the model is persisted back to XML—but now as supervised metadata. This XML becomes the basis for automated processing, such as generating data structures, pipelines, or validation artifacts.

Examples

Below are small examples you can copy into your projects. They illustrate sources, tables, views, lineage, and options.

1) Declaring a source and a table with typed columns and options

Sources {
    Source WWI  {
        Table Application.Cities [SchemaName:Application,SourceName:wwierp] {
            Columns {
                CityID INT NOT NULL PRIMARY KEY [BasisDataType:INTEGER,Ordinal:1,IsKey:1,IsTechnical:0],
                CityName NVARCHAR(50) NOT NULL [BasisDataType:ALPHANUMERIC,Ordinal:2,IsKey:0,IsTechnical:0],
                ...
            }
        }
    }
}
  • Uses SOURCES and Source to group objects; Table includes a Columns block with types, nullability, primary key, and options.

2) Bronze-layer view with lineage (Origins) and semantic options

Folder Bronze { 
    View Bronze_Access.Application_Cities [SchemaName:'Bronze_Access'] {
        Columns {
            DWH_SOURCE_ID INT NOT NULL [BasisDataType:INTEGER, IsTechnical:1], 
            CityID INT NULL UNIQUE [BasisDataType:INTEGER, IsKey:1, OriginColumnName:'CityID', OriginId:'Bronze.Bronze_Store.Application_Cities.CityID'], 
            ... 
        }
        Origins { 
            Object "Bronze.Bronze_Store.Application_Cities" AS Stage [AliasName:Stage, SchemaName:'Bronze_Store'] 
        } 
    } 
}
  • Uses FOLDER to organize layer-specific models; View includes Columns and an Origins block with an OBJECT, aliased via AS, and free-form options.

3) Columns, defaults, identity, and constraints

Table Sales.Order [SchemaName:dbo] {
  Columns {
    OrderId INT IDENTITY(1,1) NOT NULL PRIMARY KEY,
    OrderDate DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP,
    CustomerId INT NOT NULL,
    Comment NVARCHAR(200),
    Total DECIMAL(18,2) NOT NULL
  }
}
  • Shows supported types, IDENTITY with seed and increment, NULL/NOT NULL, defaults, and constraints (PRIMARY KEY, UNIQUE).

4) Lineage with UNION/JOIN and field mappings

View rpt.ActiveOrders [SchemaName:dbo] {
  Columns { OrderId INT, CustomerId INT, Status VARCHAR(20) },
  Origins JOIN [on:OrderId] {
    Object erp.Orders AS o [system:'ERP'] {
      o.OrderId = rpt.ActiveOrders.OrderId
    },
    Object crm.OrderExtras AS e {
      e.OrderId = rpt.ActiveOrders.OrderId
    }
  }
}
  • ORIGINS declares composition via JOIN; OBJECT entries can have aliases and free-form options; mappings use SourceId = TargetId.

5) References (relationships) with rules

References [strict:true] {
  Reference FK_Order_Customer FOREIGNOBJECT:Sales.Order PRIMARYOBJECT:dbo.Customer [cascade:false] {
    CustomerId = Id,
    OrderDate BETWEEN MinDate AND MaxDate,
    Status = 'Active' [weight:10],
    'VIP' = Tag,
    CustomerId = Id EXPRESSION: 'Customer.Id equals Order.CustomerId'
  }
}
  • REFERENCES groups REFERENCE entries; supports comparisons between identifiers and constants, BETWEEN ranges, and EXPRESSION annotations to document logic.

Getting started

  • Model sources and organization: use SOURCES to declare upstream systems and FOLDERs to group objects by domain or layer (e.g., Bronze, Silver, Gold).
  • Define objects: create TABLE or VIEW blocks with COLUMNS, including types, nullability, identity, defaults, and constraints.
  • Add lineage: use ORIGINS with UNION/JOIN; add OBJECT entries with optional aliases and field mappings.
  • Describe relationships: use REFERENCES with REFERENCE rules for FK-to-PK matching, constants, ranges, and annotations.
  • Use options for metadata: add square-bracketed [ ... ] options anywhere to capture schema, ownership, precision, hints, etc.

Tips and best practices

  • Quote identifiers that contain spaces or special characters (e.g., "Order Item") to avoid parsing ambiguity.
  • Keep operational and semantic metadata in [ ... ] options; this separates core schema from auxiliary attributes without cluttering structure.
  • Be explicit about lineage behavior: choose UNION vs. JOIN in ORIGINS to document composition semantics clearly.
  • Use REFERENCES consistently: FOREIGNOBJECT represents the foreign-key side; PRIMARYOBJECT the primary-key side.

Advanced: Expressions and operator precedence

DnAML supports expressions for defaults and rules. You can use identifiers, strings, numbers, or tuples; unary operators include NOT, plus/minus, bitwise NOT; binary operators include arithmetic, bitwise, comparison, AND/OR, LIKE/IN, and their NOT variants. Operator precedence ensures predictable evaluation of complex expressions.

DEFAULT (Amount + Tax) / 2
DEFAULT NOT (Status = 'Inactive')
  • Parentheses control evaluation order; compound tokens like NOT LIKE and NOT IN are treated as single operators in parsing.

Where DnAML fits in Analytixus

  • Bronze/Silver/Gold modeling: use FOLDERs to organize layers and Views/Tables to declare layer-specific schemas, lineage, and rules.
  • Automation: curated DnAML models can drive generation of DDL scripts, pipeline tasks, notebooks, or validation configurations.
  • Governance: semantic enrichment in DnAML plus supervised XML persistence enables transparent lineage and audit-ready documentation.

Summary

DnAML provides a unified, domain-specific language to describe your analytics models: sources, objects, attributes, lineage, and relationships. It is readable enough for collaborative design and precise enough for reliable automation. If you’re building or maintaining Data Lakehouse or analytic platforms, DnAML can serve as the backbone of your metadata-driven workflows—bridging raw extraction and curated, supervised models that power repeatable, scalable data engineering.

Lean Data Vault: A Pragmatic Approach to Data Modeling in the Lakehouse

Every data and analytics project, including data lakehouse builds, needs a framework of best practices. This one is definitely worth reading if you're looking for a data modeling approach that doesn't overcomplicate things, focuses on data quality, and supports massively parallel processing.

Pingback: https://andyloewen.de/2025/09/17/lean-data-vault-a-pragmatic-approach-to-data-modeling-in-the-lakehouse/